__One, No one and One Hundred Thousand: the paradigm of the Z-R relationship__

__import packages__

In [1]:
import sys
import pandas as pd 
import numpy as np
import math
import copy as cp

sys.path.append("..") # Adds higher directory to python modules path.

import disdrorain as dr
# import disdrorain_wspeed as drsp

__load data into disdrorain class object__

In [2]:
_RD80_ = dr.disdrorain(datapath='../data/RD80_sample_data') # by default disdrometer class limits are RD80 classes
_RD69_ = dr.disdrorain(datapath='../data/RD69_sample_data',classpath='../data/RD69_celllimits') # load RD69 data with RD69 class limits

let's have a look at those class limits

In [3]:
_RD80_.classlimits
Out[3]:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
class borders
left 0.313 0.405 0.505 0.596 0.715 0.827 0.999 1.232 1.429 1.582 1.748 2.077 2.441 2.727 3.011 3.385 3.704 4.127 4.573 5.145
right 0.405 0.505 0.596 0.715 0.827 0.999 1.232 1.429 1.582 1.748 2.077 2.441 2.727 3.011 3.385 3.704 4.127 4.573 5.145 5.601
In [4]:
_RD69_.classlimits
Out[4]:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
class borders
left 0.310 0.408 0.506 0.597 0.715 0.827 1.000 1.233 1.430 1.583 1.749 2.077 2.441 2.729 3.013 3.388 3.706 4.130 4.575 5.144
right 0.408 0.506 0.597 0.715 0.827 1.000 1.233 1.430 1.583 1.749 2.077 2.441 2.729 3.013 3.388 3.706 4.130 4.575 5.144 5.598

let's have a look at the disdrometer counts

In [5]:
_RD80_.data.head(5)
Out[5]:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
record number
0 7 4 17 7 5 9 15 24 12 3 4 2 0 0 0 0 0 0 0 0
1 5 7 9 15 5 9 28 20 15 8 6 0 0 0 0 0 0 0 0 0
2 1 5 14 16 9 27 50 27 14 7 4 0 0 0 0 0 0 0 0 0
3 6 8 12 15 13 32 36 24 13 8 13 0 0 0 0 0 0 0 0 0
4 3 6 14 10 12 31 41 46 12 8 11 4 0 0 0 0 0 0 0 0
In [6]:
_RD69_.data.head(5)
Out[6]:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
record number
0 5 7 8 13 13 18 18 3 0 2 2 1 4 1 3 2 8 6 9 9
1 0 4 3 64 47 130 235 143 103 107 183 177 80 64 61 23 28 9 4 4
2 0 0 0 3 8 46 195 150 110 166 315 248 116 86 55 32 10 3 4 0
3 0 0 24 90 127 294 523 365 236 270 357 245 95 58 32 11 5 0 0 0
4 14 0 16 46 80 278 468 286 219 179 317 222 113 95 68 25 16 5 0 0

__remove outliers count__

Purpose: eliminate outlier counts.

  • We look for the class with the maximum count. From this class, we move to the left (right) until a class with a null count is found or the minimum (maximum) diameter class is reached: this class is the leftmost (rightmost) class.
  • All counts to the left of the leftmost class and to the right of the rightmost class are set to zero.
  • E.g.: the following 20-class disdrometer count
      [60 157 121 124 68 67 74 44 14 10 18 11 0 2 0 0 1 0 0 0] is reduced to
      [60 157 121 124 68 67 74 44 14 10 18 11 0 0 0 0 0 0 0 0]
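The rule above can be sketched as a standalone function. This is a hypothetical re-implementation for illustration only (the library's own `outlier_deletion` method is what we actually use below); the function name is ours:

```python
import numpy as np

def zero_outlier_tails(counts):
    """Zero every count outside the contiguous block of non-zero
    classes that contains the class with the maximum count."""
    c = np.asarray(counts).copy()
    peak = int(np.argmax(c))          # class with the maximum count
    # walk left from the peak until a null-count class (or class 1)
    left = peak
    while left > 0 and c[left - 1] > 0:
        left -= 1
    # walk right from the peak until a null-count class (or the last class)
    right = peak
    while right < len(c) - 1 and c[right + 1] > 0:
        right += 1
    # everything outside [left, right] is treated as an outlier count
    c[:left] = 0
    c[right + 1:] = 0
    return c

record = [60, 157, 121, 124, 68, 67, 74, 44, 14, 10, 18, 11, 0, 2, 0, 0, 1, 0, 0, 0]
print(zero_outlier_tails(record).tolist())
# [60, 157, 121, 124, 68, 67, 74, 44, 14, 10, 18, 11, 0, 0, 0, 0, 0, 0, 0, 0]
```

On the example record, the peak is in C2; the first null count to its right is C13, so the isolated counts in C14 and C17 are zeroed.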
In [7]:
# _RD80_clean and _RD69_clean are two new disdrorain class objects whose "data" have been processed by removing outlier counts
# RD80_summary and RD69_summary are two data frames reporting the "impact" of the outlier removal procedure

(_RD80_clean,RD80_summary) = _RD80_.outlier_deletion()
(_RD69_clean,RD69_summary) = _RD69_.outlier_deletion()

Let's look at how the first record of RD69 has been changed:

In [8]:
print(_RD69_.data.head(1).to_string(index=False))
print(_RD69_clean.data.head(1).to_string(index=False))
 C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14  C15  C16  C17  C18  C19  C20
  5   7   8  13  13  18  18   3   0    2    2    1    4    1    3    2    8    6    9    9
 C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14  C15  C16  C17  C18  C19  C20
  5   7   8  13  13  18  18   3   0    0    0    0    0    0    0    0    0    0    0    0

The counts from C10 to C20 have been set to zero. This might seem "very bad", but such cases are very rare. Let's have a look at the summary data frames.

  • The RD69_summary data frame reports, for each record affected by the outlier removal, the number of drops before and after the removal, together with the percentage change.
  • We can use this information to understand the impact of the outlier removal operation.
In [9]:
print("RD69 sample report")
print("")
print("The original number of drops in the RD69  sample is:   ",  _RD69_.data.sum().sum())
print("After removing the outliers, the number of drops is:   ",  _RD69_clean.data.sum().sum())
print("We removed ", (_RD69_.data.sum().sum()-_RD69_clean.data.sum().sum()), " drops: ",\
     round((_RD69_.data.sum().sum()-_RD69_clean.data.sum().sum())/_RD69_.data.sum().sum()*100,2),"% of the original drops")
print("")
print("The number of records affected is: ",RD69_summary.shape[0], \
      " (=",round(RD69_summary.shape[0]/_RD69_.data.shape[0]*100,2), "% of the original number of records)")
print("")
print("To better quantify the impact, we can calculate the quantiles of the percentage of drops removed for each affected record")
print(RD69_summary['perc_delta_drops'].quantile([0.01,0.05,0.25,0.5,0.75,0.95,0.99]))
RD69 sample report

The original number of drops in the RD69  sample is:    2758320
After removing the outliers, the number of drops is:    2753796
We removed  4524  drops:  0.16 % of the original drops

The number of records affected is:  1091  (= 15.9 % of the original number of records)

To better quantify the impact, we can calculate the quantiles of the percentage of drops removed for each affected record
0.01   -35.926052
0.05   -17.248284
0.25    -2.666667
0.50    -1.111111
0.75    -0.578035
0.95    -0.149319
0.99    -0.050813
Name: perc_delta_drops, dtype: float64
In [10]:
print("RD80 sample report")
print("")
print("The original number of drops in the RD80 sample is:    ",  _RD80_.data.sum().sum())
print("After removing the outliers, the number of drops is:   ",  _RD80_clean.data.sum().sum())
print("We removed ", (_RD80_.data.sum().sum()-_RD80_clean.data.sum().sum()), " drops: ",\
     round((_RD80_.data.sum().sum()-_RD80_clean.data.sum().sum())/_RD80_.data.sum().sum()*100,2),"% of the original drops")
print("")
print("The number of records affected is: ",RD80_summary.shape[0], \
      " (=",round(RD80_summary.shape[0]/_RD80_.data.shape[0]*100,2), "% of the original number of records)")
print("")
print("To better quantify the impact, we can calculate the quantiles of the percentage of drops removed for each affected record")
print(RD80_summary['perc_delta_drops'].quantile([0.01,0.05,0.25,0.5,0.75,0.95,0.99]))
RD80 sample report

The original number of drops in the RD80 sample is:     19780517
After removing the outliers, the number of drops is:    19754263
We removed  26254  drops:  0.13 % of the original drops

The number of records affected is:  9197  (= 13.41 % of the original number of records)

To better quantify the impact, we can calculate the quantiles of the percentage of drops removed for each affected record
0.01   -32.194432
0.05    -8.064516
0.25    -1.639344
0.50    -0.943396
0.75    -0.497512
0.95    -0.198413
0.99    -0.114140
Name: perc_delta_drops, dtype: float64

__calculate (Z,R) and the reduced variable (z,r)__

In [11]:
# calculate bulk variables with the hypothesis that drop speed is a power law : v(D)=3.67*D^0.67
RD80_bulk_vplaw=_RD80_clean.bulk_variables()
RD69_bulk_vplaw=_RD69_clean.bulk_variables()
# calculate bulk variables with the hypothesis that drop speed is an "exponential plateau" : v(D)=9.65-10.3*exp(-0.6*D)
RD80_bulk_vexpo=_RD80_clean.bulk_variables(_speed_='expo')
RD69_bulk_vexpo=_RD69_clean.bulk_variables(_speed_='expo')

# r=R/N z=Z/Nv w=W/Nv
RD80_bulk_vplaw.head(5)
Out[11]:
N Nv R Z W r z w
record number
0 109 106 1.250231 520.994421 138.686532 0.011470 4.926270 1.311353
1 127 119 1.464971 535.769991 165.411313 0.011535 4.516735 1.394477
2 174 158 1.727067 533.244185 205.105277 0.009926 3.370164 1.296289
3 180 168 2.000153 765.963042 226.223811 0.011112 4.558634 1.346372
4 198 175 2.536535 1077.006084 280.224558 0.012811 6.156963 1.601971

show reduced variability for the couple (z,r) with respect to the couple (Z,R)

In [12]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file, output_notebook

TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select"

output_notebook() 

p1 = figure( tools=TOOLS, title="RD69 -- v(D)=3.67*D**0.67", plot_width=400, plot_height=400, y_range=(-3,7), x_range=(-4,3))
p1.xaxis.axis_label="(log10(R),log10(r))"
p1.yaxis.axis_label="(log10(Z),log10(z))"
p3 = figure( tools=TOOLS, title="RD80 -- v(D)=3.67*D**0.67" ,plot_width=400, plot_height=400, y_range=(-3,7), x_range=(-4,3))
p3.xaxis.axis_label="(log10(R),log10(r))"
p3.yaxis.axis_label="(log10(Z),log10(z))"


p1.circle(np.log10(RD69_bulk_vplaw.R.values), np.log10(RD69_bulk_vplaw.Z.values), color="red", size=1, legend_label="(log10(Z),log10(R))")
p1.circle(np.log10(RD69_bulk_vplaw.r.values), np.log10(RD69_bulk_vplaw.z.values), color="cyan", size=1, legend_label="(log10(z),log10(r))")
p3.circle(np.log10(RD80_bulk_vplaw.R.values), np.log10(RD80_bulk_vplaw.Z.values), color="red", size=1, legend_label="(log10(Z),log10(R))")
p3.circle(np.log10(RD80_bulk_vplaw.r.values), np.log10(RD80_bulk_vplaw.z.values), color="cyan", size=1, legend_label="(log10(z),log10(r))")

p2 = figure( tools=TOOLS, title="RD69 -- v(D)=9.65-10.3*exp(-0.6*D)", plot_width=400, plot_height=400, y_range=(-3,7), x_range=(-4,3))
p2.xaxis.axis_label="(log10(R),log10(r))"
p2.yaxis.axis_label="(log10(Z),log10(z))"
p4 = figure( tools=TOOLS, title="RD80 -- v(D)=9.65-10.3*exp(-0.6*D)",plot_width=400, plot_height=400, y_range=(-3,7), x_range=(-4,3))
p4.xaxis.axis_label="(log10(R),log10(r))"
p4.yaxis.axis_label="(log10(Z),log10(z))"

p2.circle(np.log10(RD69_bulk_vexpo.R.values), np.log10(RD69_bulk_vexpo.Z.values), color="red", size=1, legend_label="(log10(Z),log10(R))")
p2.circle(np.log10(RD69_bulk_vexpo.r.values), np.log10(RD69_bulk_vexpo.z.values), color="cyan", size=1, legend_label="(log10(z),log10(r))")
p4.circle(np.log10(RD80_bulk_vexpo.R.values), np.log10(RD80_bulk_vexpo.Z.values), color="red", size=1, legend_label="(log10(Z),log10(R))")
p4.circle(np.log10(RD80_bulk_vexpo.r.values), np.log10(RD80_bulk_vexpo.z.values), color="cyan", size=1, legend_label="(log10(z),log10(r))")

p1.legend.location = "top_left"
p3.legend.location = "top_left"
p2.legend.location = "top_left"
p4.legend.location = "top_left"


grid = gridplot([[p1, p2,],[p3, p4]])
show(grid)

__Approximate universal scaling relation for (z,r)__

  • $z=C \zeta_{p} r^{2}$ $\Leftrightarrow$ $\zeta_{p}= \frac{1}{C} \frac{z}{r^{2}}$
  • $C = \left( \frac{A_{m} T}{6 \pi 10^{-4}} \right)^{2}$ is an instrumentation factor
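As a quick sanity check, the instrumentation factor can be reproduced outside the class. The numeric factors 0.005 and 60 are taken verbatim from the cell below; reading them as $A_{m}$ and $T$ respectively is our assumption:

```python
import math

# reproduce the instrumentation factor used in the next cell:
# C = (A_m * T / (6 * pi * 1e-4))**2, with A_m = 0.005 and T = 60
# (interpreting A_m as catchment area and T as sampling time is an assumption)
A_m, T = 0.005, 60
C = ((A_m * T) / (6 * math.pi * 1e-4)) ** 2
print(round(C, 1))  # 25330.3
```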
In [13]:
C=pow((0.005*60)/(6*math.pi*0.0001),2) 
RD69_bulk_vplaw.loc[:,'zetap']=RD69_bulk_vplaw.z/pow(RD69_bulk_vplaw.r,2)/C
RD80_bulk_vplaw.loc[:,'zetap']=RD80_bulk_vplaw.z/pow(RD80_bulk_vplaw.r,2)/C
RD69_bulk_vexpo.loc[:,'zetap']=RD69_bulk_vexpo.z/pow(RD69_bulk_vexpo.r,2)/C
RD80_bulk_vexpo.loc[:,'zetap']=RD80_bulk_vexpo.z/pow(RD80_bulk_vexpo.r,2)/C
In [14]:
def make_zetap_cdf(_df_, _binsize_):
    # assign each record to a zetap bin of width _binsize_
    _df_.loc[:,'bin'] = np.floor(_df_.zetap/_binsize_)*_binsize_
    # count the records falling in each bin
    _A_ = _df_.groupby('bin')['N'].count().to_frame().reset_index()
    _A_.rename(columns={'N':'count'},inplace=True)
    # empirical cumulative distribution function
    _A_.loc[:,'cdf'] = _A_['count'].cumsum()/_A_['count'].sum()
    # prepend an empty bin so the CDF starts from zero
    _A_.loc[-1] = [_A_['bin'].min()-_binsize_, 0, 0]
    _A_.index = _A_.index + 1  # shifting index
    _A_ = _A_.sort_index()  # sorting by index

    return _A_


RD69_zetap_cdf_vplaw = make_zetap_cdf(RD69_bulk_vplaw,0.025)
RD80_zetap_cdf_vplaw = make_zetap_cdf(RD80_bulk_vplaw,0.025)
RD69_zetap_cdf_vexpo = make_zetap_cdf(RD69_bulk_vexpo,0.025)
RD80_zetap_cdf_vexpo = make_zetap_cdf(RD80_bulk_vexpo,0.025)

Plotting the survival distribution for $\zeta_{p}$

In [15]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file, output_notebook

TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select"

output_notebook() 

p1 = figure( tools=TOOLS, title="RD69", plot_width=500, plot_height=400)
p2 = figure( tools=TOOLS, title="RD80", plot_width=500, plot_height=400)

p1.line(RD69_zetap_cdf_vplaw.loc[RD69_zetap_cdf_vplaw.cdf<1,:].bin.values,np.log10(1-RD69_zetap_cdf_vplaw.loc[RD69_zetap_cdf_vplaw.cdf<1,:].cdf.values), \
        color="burlywood", line_width=4, legend_label="v(D)=3.67*D**0.67")
p1.line(RD69_zetap_cdf_vexpo.loc[RD69_zetap_cdf_vexpo.cdf<1,:].bin.values,np.log10(1-RD69_zetap_cdf_vexpo.loc[RD69_zetap_cdf_vexpo.cdf<1,:].cdf.values), \
        color="darkturquoise", line_width=4, legend_label="v(D)=9.65-10.3*exp(-0.6*D)")


p2.line(RD80_zetap_cdf_vplaw.loc[RD80_zetap_cdf_vplaw.cdf<1,:].bin.values,np.log10(1-RD80_zetap_cdf_vplaw.loc[RD80_zetap_cdf_vplaw.cdf<1,:].cdf.values), \
        color="burlywood", line_width=4, legend_label="v(D)=3.67*D**0.67")
p2.line(RD80_zetap_cdf_vexpo.loc[RD80_zetap_cdf_vexpo.cdf<1,:].bin.values,np.log10(1-RD80_zetap_cdf_vexpo.loc[RD80_zetap_cdf_vexpo.cdf<1,:].cdf.values), \
        color="darkturquoise", line_width=4, legend_label="v(D)=9.65-10.3*exp(-0.6*D)")

p1.xaxis.axis_label="zeta_p"
p1.yaxis.axis_label="1-CDF(zeta_p)"
p2.xaxis.axis_label="zeta_p"
p2.yaxis.axis_label="1-CDF(zeta_p)"

grid = gridplot([[p1, p2,],])
show(grid)